Multi-Source Domain Adaptation for Text-Independent Forensic Speaker Recognition

Abstract

Adapting speaker recognition systems to new environments is a widely used technique to improve a well-performing model learned from large-scale data towards task-specific small-scale data scenarios. However, previous studies focus on single-domain adaptation, which neglects the more practical scenario where training data are collected from multiple acoustic domains, as needed in forensic scenarios. Audio analysis for forensic speaker recognition offers unique challenges in model training with multi-domain training data due to location/scenario uncertainty and the diversity mismatch between reference and naturalistic field recordings. It is also difficult to directly employ small-scale domain-specific data to train complex neural network architectures due to domain mismatch and performance loss. Fine-tuning is a commonly used method for adaptation in order to retrain the model with weights initialized from a well-trained model. Alternatively, in this study, three novel adaptation methods based on domain adversarial training, discrepancy minimization, and moment-matching approaches are proposed to further promote adaptation performance across multiple acoustic domains. A comprehensive set of experiments is conducted to demonstrate that: 1) diverse acoustic environments do impact speaker recognition performance, which could advance research in audio forensics, 2) domain adversarial training learns discriminative features that are also invariant to shifts between domains, 3) discrepancy-minimizing adaptation achieves effective performance simultaneously across multiple acoustic domains, and 4) moment-matching adaptation along with dynamic distribution alignment also significantly promotes speaker recognition performance on each domain, especially for the LENA-field domain with noise, compared to all other systems. The advancements shown here in adaptation therefore help ensure more consistent performance for field operational data in audio forensics.
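
The abstract names moment matching as one of the three adaptation approaches but gives no formulation. As a rough illustration only, the sketch below computes a generic multi-source moment-matching penalty: it compares per-dimension raw moments of embeddings from several source domains against a target domain, and also between source domains. Everything here is an assumption made for illustration, not the authors' method: the function names, the use of first and second raw moments, the Euclidean distance, and the equal weighting of the two terms are all hypothetical, and the dynamic distribution alignment mentioned in the abstract is not modeled.

```python
from statistics import fmean


def moments(features, order):
    """Per-dimension raw moment E[x**order] over a list of equal-length vectors."""
    dims = len(features[0])
    return [fmean(vec[d] ** order for vec in features) for d in range(dims)]


def moment_distance(a, b, max_order=2):
    """Sum, over orders 1..max_order, of the Euclidean distance between moments."""
    total = 0.0
    for k in range(1, max_order + 1):
        ma, mb = moments(a, k), moments(b, k)
        total += sum((x - y) ** 2 for x, y in zip(ma, mb)) ** 0.5
    return total


def multi_source_moment_loss(sources, target, max_order=2):
    """Mean source-to-target distance plus mean pairwise source-to-source distance.

    Hypothetical weighting: both terms count equally.
    """
    n = len(sources)
    src_tgt = sum(moment_distance(s, target, max_order) for s in sources) / n
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    src_src = 0.0
    if pairs:
        src_src = sum(
            moment_distance(sources[i], sources[j], max_order) for i, j in pairs
        ) / len(pairs)
    return src_tgt + src_src


if __name__ == "__main__":
    # Toy 2-dimensional "embeddings" from two source domains and one target domain.
    clean = [[0.0, 1.0], [2.0, 3.0]]
    phone = [[0.0, 1.0], [2.0, 3.0]]
    field = [[1.0, 1.0], [3.0, 3.0]]
    print(multi_source_moment_loss([clean, phone], field))
```

In a training setup, a penalty of this kind would typically be added to the speaker classification loss so that the embedding extractor is pushed towards features whose statistics agree across domains; how the paper combines the terms is not stated in the abstract.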

Similar resources

Text-independent Speaker Recognition

In this paper, a text-independent speaker recognition method based on the Wavelet Transform and mel-cepstrum is presented. The experimental results point to the best Wavelet Transform parameters for speaker identification and can be useful for designing speaker identification systems. This kind of person identification method is useful in services such as banking by telephone, access authorization to...

Transformation enhanced multi-grained modeling for text-independent speaker recognition

We describe our formulation of transformation enhanced data modeling used to develop a multi-grained data analysis approach to text independent speaker recognition. The broad goal is to address difficulties caused by sparse training and test data. First, our development of maximum likelihood transformation based recognition with diagonally constrained Gaussian mixture models is detailed. We giv...

Multi-state predictive neural networks for text-independent speaker recognition

Both Hidden Markov Models and Neural Networks have already been used as production systems for speaker identification or verification. Recently [9] has shown that ergodic multi-state hidden Markov Models do not outperform one-state "hidden" Markov Models, i.e. Gaussian Mixture Models, for speaker recognition. She put in evidence that the important characteristic of these models is the total num...

Domain adaptation for text-dependent speaker verification

Recently we have investigated the use of state-of-the-art text-dependent speaker verification algorithms for user authentication and obtained satisfactory results mainly by using a fair amount of text-dependent development data from the target domain. In this work we investigate the ability to build high accuracy text-dependent systems using no data at all from the target domain. Instead of usin...

Speaker-specific mapping for text-independent speaker recognition

In this paper, we present the concept of speaker-specific mapping for the task of speaker recognition. The speaker-specific mapping is realized using a multilayer feedforward neural network. In the mapping approach, the aim is to capture the speaker-specific information by mapping a set of parameter vectors specific to linguistic information in the speech, to a set of parameter vectors having li...

Journal

Journal title: IEEE/ACM Transactions on Audio, Speech, and Language Processing

Year: 2022

ISSN: 2329-9304, 2329-9290

DOI: https://doi.org/10.1109/taslp.2021.3130975